A Privacy Data-oriented Hierarchical MapReduce Programming Model
Authors
Abstract
To protect private data efficiently in hybrid cloud services, this paper presents a multi-cluster MapReduce programming model based on a hierarchical control architecture: the Hierarchical MapReduce model (HMR). Under this architecture, a control center in the private cloud isolates data and places it in either the private cloud or the public clouds according to its privacy characteristics. Distributed parallel computation across multiple clusters differs from the conventional single-cluster mode, so a three-stage Map-Reduce-GlobalReduce scheduling process is designed to keep the computation correct. HMR confines computation on private data to the private cloud and outsources as much computation on non-private data to public clouds as possible. As a result, it achieves both security and low cost.
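The placement step and the Map-Reduce-GlobalReduce flow described above can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the paper's implementation. The record layout `(text, is_private)`, the cluster names `private` and `public-i`, round-robin placement, and the word-count job are all hypothetical choices made for illustration. Each cluster runs Map and Reduce locally to produce a partial result. GlobalReduce then merges the partial results and is assumed to run in the private cloud.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout (assumption): (payload text, privacy flag).
Record = Tuple[str, bool]


def place(records: Iterable[Record], n_public: int) -> Dict[str, List[str]]:
    """Control center (private cloud): keep private records in the private
    cluster and spread non-private records round-robin over public clusters."""
    if n_public < 1:
        raise ValueError("need at least one public cluster")
    clusters: Dict[str, List[str]] = {"private": []}
    clusters.update({f"public-{i}": [] for i in range(n_public)})
    next_public = 0
    for text, is_private in records:
        if is_private:
            clusters["private"].append(text)
        else:
            clusters[f"public-{next_public % n_public}"].append(text)
            next_public += 1
    return clusters


def map_stage(texts: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word, as in a word-count job."""
    for text in texts:
        for word in text.split():
            yield word, 1


def reduce_stage(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce: sum counts per key inside one cluster (a partial result)."""
    partial: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return dict(partial)


def global_reduce(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """GlobalReduce: merge per-cluster partial results. In this sketch it
    conceptually runs in the private cloud, which receives only partial
    results from the public clusters."""
    total: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


def run_hmr(records: Iterable[Record], n_public: int = 2) -> Dict[str, int]:
    """Run the three stages: place, per-cluster Map+Reduce, then GlobalReduce."""
    clusters = place(records, n_public)
    partials = [reduce_stage(map_stage(texts)) for texts in clusters.values()]
    return global_reduce(partials)


if __name__ == "__main__":
    data = [("alice ssn 123", True), ("hello world", False), ("hello cloud", False)]
    print(run_hmr(data))
```

Merging partial results in a separate GlobalReduce stage only gives the same answer as a single-cluster job when the reduce operation can be combined across clusters, as summation can. A job with a non-combinable reduce would need a different GlobalReduce.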
Similar resources
Parallelizing K-Anonymity Algorithm for Privacy Preserving Knowledge Discovery from Big Data
Disclosure control has become inevitable as privacy is given paramount importance while publishing data for mining. The data mining community enjoyed a revival after Samarati and Sweeney proposed k-anonymization for privacy-preserving data mining. k-anonymity has gained high popularity in research circles. Though it has some drawbacks and other PPDM algorithms such as l-diversity, t-closeness ...
Privacy-Preserving Secret Shared Computations using MapReduce
Data outsourcing allows data owners to keep their data at untrusted clouds that do not ensure the privacy of data and/or computations. One useful framework for fault-tolerant data processing in a distributed fashion is MapReduce, which was developed for trusted private clouds. This paper presents algorithms for data outsourcing based on Shamir’s secret-sharing scheme and for executing privacy-p...
Security and Privacy Aspects in MapReduce on Clouds: A Survey
MapReduce is a programming system for distributed processing of large-scale data in an efficient and fault-tolerant manner on a private, public, or hybrid cloud. MapReduce is used extensively every day around the world as an efficient distributed computation tool for a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern ma...
Representing MapReduce Optimisations in the Nested Relational Calculus
The MapReduce programming model has recently been getting a lot of attention from both academic and business researchers. Systems based on this model hide communication and synchronization issues from the user and allow processing of high volumes of data on thousands of commodity computers. In this paper we are interested in applying MR to processing hierarchical data with nested collections such as ...
Scather: programming with multi-party computation and MapReduce
We present a prototype of a distributed computational infrastructure, an associated high-level programming language, and an underlying formal framework that allow multiple parties to leverage their own cloud-based computational resources (capable of supporting MapReduce [27] operations) in concert with multi-party computation (MPC) to execute statistical analysis algorithms that have privacy-pre...